Additive models with shape constraints
In many practical situations, when analyzing the dependence of a response variable on one or more explanatory variables, it is essential to assume that the relationship of interest obeys certain shape constraints, such as monotonicity, or monotonicity and convexity/concavity. In this thesis a new approach to shape preserving smoothing within generalized additive models has been developed. In contrast with previous quadratic programming based methods, intermediate rank penalized smoothers with shape restrictions are developed, based on re-parameterized B-splines, with penalties following the P-spline ideas of Eilers and Marx (1996). Smoothing under monotonicity constraints, and under monotonicity together with convexity/concavity, is considered for univariate smooths, as is smoothing of bivariate functions with monotonicity restrictions on both covariates or on only one of them. The proposed shape constrained smoothing has been incorporated into generalized additive models with a mixture of unconstrained and shape restricted smooth terms (mono-GAM), and a fitting procedure for mono-GAM is developed. Since a major challenge of any flexible regression method is its implementation in a computationally efficient and stable manner, issues such as convergence, rank deficiency of the working model matrix, and initialization have been thoroughly dealt with. The limiting posterior distribution of the model parameters is derived, which allows Bayesian confidence intervals for the mono-GAM smooth terms to be constructed by means of the delta method. The performance of these confidence intervals is examined by assessing realized coverage probabilities in simulation studies. The proposed modelling approach has been implemented in an R package, monogam. The model setup is the same as in mgcv(gam) with the addition of shape constrained smooths.
In order to be consistent with the unconstrained GAM, the package provides key functions similar to those associated with mgcv(gam). Performance and timing comparisons of mono-GAM with alternative methods have been undertaken. The simulation studies show that the new method has practical advantages over the alternatives considered. Applications of mono-GAM to various data sets are presented which demonstrate its ability to model many practical situations.
EThOS - Electronic Theses Online Service, United Kingdom
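The monotonicity mechanism behind such re-parameterized B-splines can be sketched briefly: if the basis coefficients are built as cumulative sums of exponentials of unconstrained parameters, they are non-decreasing by construction, and so is the resulting smooth. The sketch below uses a linear (hat function) basis for self-containment; all names are illustrative and this is not the monogam implementation.

```python
import math

def monotone_coefs(gamma):
    """beta_0 = gamma_0; beta_j = beta_{j-1} + exp(gamma_j) for j >= 1,
    so the coefficient sequence is non-decreasing whatever gamma is."""
    beta = [gamma[0]]
    for g in gamma[1:]:
        beta.append(beta[-1] + math.exp(g))
    return beta

def piecewise_linear(x, knots, beta):
    """Evaluate the smooth in a linear B-spline (hat function) basis:
    f(knot_j) = beta_j, with linear interpolation between knots."""
    for j in range(len(knots) - 1):
        if knots[j] <= x <= knots[j + 1]:
            t = (x - knots[j]) / (knots[j + 1] - knots[j])
            return (1 - t) * beta[j] + t * beta[j + 1]
    raise ValueError("x outside knot range")

# Arbitrary unconstrained parameters still yield a monotone smooth.
gamma = [0.5, -1.2, 0.3, -4.0, 1.1]
knots = [0.0, 0.25, 0.5, 0.75, 1.0]
beta = monotone_coefs(gamma)
values = [piecewise_linear(i / 20, knots, beta) for i in range(21)]
```

In practice the gamma parameters would be estimated by penalized likelihood with a P-spline penalty; here they are arbitrary, which is the point: any value gives a valid monotone fit.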
Shape constrained additive models
A framework is presented for generalized additive modelling under shape constraints on the component functions of the linear predictor of the GAM. We represent shape constrained model components by mildly non-linear extensions of P-splines. Models can contain multiple shape constrained and unconstrained terms as well as shape constrained multi-dimensional smooths. The constraints considered are on the signs of the first and/or second derivatives of the smooth terms. A key advantage of the approach is that it facilitates efficient estimation of smoothing parameters as an integral part of model estimation, via GCV or AIC, and numerically robust algorithms for this are presented. We also derive simulation free approximate Bayesian confidence intervals for the smooth components, which are shown to achieve close to nominal coverage probabilities. Applications are presented using real data examples, including the risk of disease in relation to proximity to municipal incinerators and the association between air pollution and health.
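The GCV criterion used for smoothing parameter estimation can be illustrated on a toy penalized regression: GCV(lam) = n * ||y - A_lam y||^2 / (n - tr(A_lam))^2, where A_lam = X (X'X + lam*S)^{-1} X' is the influence matrix. This is a sketch with invented data and a simple ridge-type penalty, not the paper's P-spline setup.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 80, 6
x = np.sort(rng.uniform(0, 1, n))
X = np.vander(x, p, increasing=True)      # toy polynomial basis
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(n)
S = np.eye(p)                             # ridge-type penalty matrix

def gcv(lam):
    """GCV score for the penalized fit with smoothing parameter lam."""
    A = X @ np.linalg.solve(X.T @ X + lam * S, X.T)   # influence matrix
    resid = y - A @ y
    return n * float(resid @ resid) / (n - np.trace(A)) ** 2

# Minimize over a log-spaced grid of smoothing parameters.
grid = np.exp(np.linspace(-10, 5, 50))
scores = [gcv(l) for l in grid]
lam_gcv = float(grid[int(np.argmin(scores))])
```

The trace term penalizes effective degrees of freedom, so GCV trades off fit against model complexity without needing a noise variance estimate.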
Smoothing parameter and model selection for general smooth models (with discussion)
This paper discusses a general framework for smoothing parameter estimation
for models with regular likelihoods constructed in terms of unknown smooth
functions of covariates. Gaussian random effects and parametric terms may also
be present. By construction the method is numerically stable and convergent,
and enables smoothing parameter uncertainty to be quantified. The latter
enables us to fix a well known problem with AIC for such models. The smooth
functions are represented by reduced rank spline like smoothers, with
associated quadratic penalties measuring function smoothness. Model estimation
is by penalized likelihood maximization, where the smoothing parameters
controlling the extent of penalization are estimated by Laplace approximate
marginal likelihood. The methods cover, for example, generalized additive
models for non-exponential family responses (for example beta, ordered
categorical, scaled t distribution, negative binomial and Tweedie
distributions), generalized additive models for location scale and shape (for
example two stage zero inflation models, and Gaussian location-scale models),
Cox proportional hazards models and multivariate additive models. The framework
reduces the implementation of new model classes to the coding of some standard
derivatives of the log likelihood.
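The Laplace approximate marginal likelihood criterion can be sketched in the Gaussian case, where the approximation is exact: profile l_p(b_hat) + 0.5*log|lam*S| - 0.5*log|X'X + lam*S| over the smoothing parameter lam, with l_p the penalized log likelihood at the penalized MLE. The toy data and names below are invented; sigma^2 is fixed at 1 for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 8
X = rng.standard_normal((n, p))
beta_true = np.array([2.0, -1.0] + [0.0] * (p - 2))
y = X @ beta_true + rng.standard_normal(n)
S = np.eye(p)   # full-rank penalty keeps log|lam*S| well defined

def laml(lam):
    """Log marginal likelihood up to an additive constant (sigma^2 = 1)."""
    H = X.T @ X + lam * S                      # negative Hessian of l_p
    b_hat = np.linalg.solve(H, X.T @ y)        # penalized MLE
    l_pen = (-0.5 * np.sum((y - X @ b_hat) ** 2)
             - 0.5 * lam * b_hat @ S @ b_hat)
    _, logdet_H = np.linalg.slogdet(H)
    _, logdet_S = np.linalg.slogdet(lam * S)
    return l_pen + 0.5 * logdet_S - 0.5 * logdet_H, b_hat

# Profile over a log-spaced grid and keep the best smoothing parameter.
grid = np.exp(np.linspace(-4, 6, 41))
scores = [laml(lam)[0] for lam in grid]
lam_best = float(grid[int(np.argmax(scores))])
```

In the general framework the same quantity is maximized numerically with respect to log smoothing parameters, using derivatives of the log likelihood rather than a grid; the grid here only keeps the sketch short.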
A comparison of inferential methods for highly non-linear state space models in ecology and epidemiology
Highly non-linear, chaotic or near chaotic, dynamic models are important in
fields such as ecology and epidemiology: for example, pest species and diseases
often display highly non-linear dynamics. However, such models are problematic
from the point of view of statistical inference. The defining feature of
chaotic and near chaotic systems is extreme sensitivity to small changes in
system states and parameters, and this can interfere with inference. There are
two main classes of methods for circumventing these difficulties: information
reduction approaches, such as Approximate Bayesian Computation or Synthetic
Likelihood, and state space methods, such as Particle Markov chain Monte Carlo,
Iterated Filtering or Parameter Cascading. The purpose of this article is to
compare the methods, in order to reach conclusions about how to approach
inference with such models in practice. We show that neither class of methods
is universally superior to the other. We show that state space methods can
suffer multimodality problems in settings with low process noise or model
mis-specification, leading to bias toward stable dynamics and high process
noise. Information reduction methods avoid this problem but, under the correct
model and with sufficient process noise, state space methods lead to
substantially sharper inference than information reduction methods. More
practically, there are also differences in the tuning requirements of different
methods. Our overall conclusion is that model development and checking should
probably be performed using an information reduction method with low tuning
requirements, while for final inference it is likely to be better to switch to
a state space method, checking results against the information reduction
approach.
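A toy version of one of the information reduction methods mentioned, synthetic likelihood, illustrates the idea: simulate the model many times at a candidate parameter, reduce each trajectory to a few summary statistics, fit a multivariate normal to the simulated summaries, and evaluate the observed summaries under that density. The Ricker map and the particular summaries below are illustrative choices, not the article's exact setup.

```python
import numpy as np

rng = np.random.default_rng(1)

def ricker(log_r, sigma, T=100):
    """Simulate N_{t+1} = N_t * exp(log_r - N_t + e_t), e_t ~ N(0, sigma^2)."""
    N = np.empty(T)
    N[0] = 1.0
    e = rng.normal(0.0, sigma, T - 1)
    for t in range(T - 1):
        N[t + 1] = N[t] * np.exp(log_r - N[t] + e[t])
    return N

def summaries(N):
    """Information reduction: mean, log variance, lag-1 autocorrelation."""
    ac = np.corrcoef(N[:-1], N[1:])[0, 1]
    return np.array([N.mean(), np.log(N.var() + 1e-12), ac])

def synthetic_loglik(log_r, s_obs, n_sim=200, sigma=0.3):
    """Fit a multivariate normal to simulated summaries, evaluate at s_obs."""
    S = np.array([summaries(ricker(log_r, sigma)) for _ in range(n_sim)])
    mu, cov = S.mean(axis=0), np.cov(S.T) + 1e-8 * np.eye(3)
    resid = s_obs - mu
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (resid @ np.linalg.solve(cov, resid) + logdet)

s_obs = summaries(ricker(log_r=3.0, sigma=0.3))
scores = {lr: synthetic_loglik(lr, s_obs) for lr in [2.0, 3.0, 4.0]}
```

Because only summaries are matched, the chaotic sensitivity of individual trajectories to states and parameters never enters the criterion, which is exactly the robustness property the article attributes to information reduction methods.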
A comparison of inferential methods for highly nonlinear state space models in ecology and epidemiology
Most of this work was undertaken at the University of Bath, where M.F. was a Ph.D. student, and it was supported in part by EPSRC Grants EP/I000917 and EP/K005251/1. Highly nonlinear, chaotic or near chaotic, dynamic models are important in fields such as ecology and epidemiology: for example, pest species and diseases often display highly nonlinear dynamics. However, such models are problematic from the point of view of statistical inference. The defining feature of chaotic and near chaotic systems is extreme sensitivity to small changes in system states and parameters, and this can interfere with inference. There are two main classes of methods for circumventing these difficulties: information reduction approaches, such as Approximate Bayesian Computation or Synthetic Likelihood, and state space methods, such as Particle Markov chain Monte Carlo, Iterated Filtering or Parameter Cascading. The purpose of this article is to compare the methods in order to reach conclusions about how to approach inference with such models in practice. We show that neither class of methods is universally superior to the other. We show that state space methods can suffer multimodality problems in settings with low process noise or model misspecification, leading to bias toward stable dynamics and high process noise. Information reduction methods avoid this problem, but, under the correct model and with sufficient process noise, state space methods lead to substantially sharper inference than information reduction methods. More practically, there are also differences in the tuning requirements of different methods. Our overall conclusion is that model development and checking should probably be performed using an information reduction method with low tuning requirements, while for final inference it is likely to be better to switch to a state space method, checking results against the information reduction approach.
Incorporating shape constraints in generalized additive modelling of the height-diameter relationship for Norway spruce
Background: Measurements of tree heights and diameters are essential in forest assessment and modelling. Tree
heights are used for estimating timber volume, site index and other important variables related to forest growth and
yield, succession and carbon budget models. However, the diameter at breast height (dbh) can be more accurately
obtained and at lower cost, than total tree height. Hence, generalized height-diameter (h-d) models that predict tree
height from dbh, age and other covariates are needed. For a more flexible but biologically plausible estimation of
covariate effects we use shape constrained generalized additive models as an extension of existing h-d model
approaches. We use causal site parameters such as index of aridity to enhance the generality and causality of the
models and to enable predictions under projected changeable climatic conditions.
Methods: We develop unconstrained generalized additive models (GAM) and shape constrained generalized
additive models (SCAM) for investigating the possible effects of tree-specific parameters such as tree age, relative
diameter at breast height, and site-specific parameters such as index of aridity and sum of daily mean temperature
during vegetation period, on the h-d relationship of forests in Lower Saxony, Germany.
Results: Some of the derived effects, e.g. the effects of age, index of aridity and sum of daily
mean temperature, show significantly non-linear patterns. The need for SCAM arises because some
of the model effects show partially implausible patterns, especially at the boundaries of the data
ranges. The derived model predicts monotonically increasing tree height with increasing age and
temperature sum, and with decreasing aridity and decreasing social rank of a tree within a stand.
Imposing the constraints leads to only a marginal decline in model statistics such as AIC.
An observed structured spatial trend in tree height is modelled via 2-dimensional surface fitting.
Conclusions: We demonstrate that the SCAM approach allows optimal regression modelling flexibility similar to the
standard GAM but with the additional possibility of defining specific constraints for the model effects. The
longitudinal character of the model allows not only for tree height imputation for the current
status of forests but also for future tree height prediction.
Keywords: Height-diameter curve, Norway spruce, Shape constrained additive models, Impact of climate change,
Varying coefficient model
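As a minimal stand-in for the monotonicity constraint in the h-d relationship, the pool adjacent violators algorithm (PAVA) gives the closest non-decreasing fit to heights ordered by diameter. It is not the SCAM method of the paper (it has no smoothness penalty or covariates), and the data below are invented.

```python
def pava(y):
    """Return the non-decreasing sequence closest to y in least squares,
    by merging adjacent blocks whose means violate monotonicity."""
    sums, counts = [], []
    for v in y:
        sums.append(v)
        counts.append(1)
        # merge while the last block mean dips below the previous one
        while len(sums) > 1 and sums[-1] / counts[-1] < sums[-2] / counts[-2]:
            s, c = sums.pop(), counts.pop()
            sums[-1] += s
            counts[-1] += c
    fit = []
    for s, c in zip(sums, counts):
        fit.extend([s / c] * c)
    return fit

# Heights (m) observed at increasing dbh; noise makes the raw values dip.
heights = [12.0, 14.5, 14.0, 16.5, 16.0, 18.5, 19.0]
fitted = pava(heights)  # -> [12.0, 14.25, 14.25, 16.25, 16.25, 18.5, 19.0]
```

Each local dip is averaged away into a flat block, which is the same qualitative effect the shape constraints have on the implausible boundary behaviour described in the Results.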
Smoothing Parameter and Model Selection for General Smooth Models
This article discusses a general framework for smoothing parameter estimation for models with regular likelihoods constructed in terms of unknown smooth functions of covariates. Gaussian random effects and parametric terms may also be present. By construction the method is numerically stable and convergent, and enables smoothing parameter uncertainty to be quantified. The latter enables us to fix a well known problem with AIC for such models, thereby improving the range of model selection tools available. The smooth functions are represented by reduced rank spline like smoothers, with associated quadratic penalties measuring function smoothness. Model estimation is by penalized likelihood maximization, where the smoothing parameters controlling the extent of penalization are estimated by Laplace approximate marginal likelihood. The methods cover, for example, generalized additive models for nonexponential family responses (e.g., beta, ordered categorical, scaled t distribution, negative binomial and Tweedie distributions), generalized additive models for location scale and shape (e.g., two stage zero inflation models, and Gaussian location-scale models), Cox proportional hazards models and multivariate additive models. The framework reduces the implementation of new model classes to the coding of some standard derivatives of the log-likelihood. Supplementary materials for this article are available online.